Azure Chaos Studio

Chaos engineering is fun, but it is especially important when building solutions in the cloud. It is great leveraging the cloud to build something, whether that’s a globally distributed website with lots of traffic or an internal 3-tier application for a business. The question is: what happens if there is an unexpected fault or disruption? Can your system or app withstand the issue?

I will quote principlesofchaos.org:

“A systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behaviour of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering.”

How does this connect with Azure? Well, there is a preview feature called Azure Chaos Studio, which is quite simply a managed service that uses chaos engineering to help you measure, understand, and improve the resilience of your cloud applications and services. You can read this link for more details: https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-overview

There are two types of faults you can use: service-based (Microsoft calls these service-direct) and agent-based. A service-based fault runs directly against an Azure resource, without any installation or instrumentation, whereas an agent-based fault runs inside the VM and causes in-guest failures.

Being that I like data tech, I looked into the Cosmos DB experiment, which looked like a realistic test. We know it’s easy to build a solution with Cosmos DB, especially a multi-region read / single-write one, but how would your application behave if there was a failover? I don’t think you would want to wait for this to happen in the real world. Sure, you could trigger it manually, but experimentation helps with automating the process and the downstream analysis. The idea with this tool is to:

Build the experiment > Add a relevant fault > Add the target to test > Give the experiment permissions > Run it.
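To make that flow a bit more concrete, here is a minimal Python sketch of what a single-fault experiment definition for the Cosmos DB failover might look like as JSON. Treat it as an illustration under assumptions rather than the official schema: the field names, the fault URN `urn:csci:microsoft:cosmosDB:failover/1.0`, and the `readRegion` parameter follow my understanding of the Chaos Studio experiment format and should be checked against the fault library. The function name, the selector id, the default location, and the resource ID placeholder are hypothetical.

```python
import json


def build_cosmos_failover_experiment(target_resource_id, read_region,
                                     location="eastus", duration="PT10M"):
    """Return an experiment definition (plain dict): one step, one branch, one fault.

    Assumption: the field names mirror the Chaos Studio experiment JSON as I
    understand it; verify them against the current documentation before use.
    """
    selector_id = "Selector1"  # hypothetical name; any unique string would do
    return {
        "location": location,
        # The experiment's identity is what later gets permissions on the target.
        "identity": {"type": "SystemAssigned"},
        "properties": {
            # "Add the target to test": a selector lists the onboarded targets.
            "selectors": [{
                "type": "List",
                "id": selector_id,
                "targets": [{"type": "ChaosTarget", "id": target_resource_id}],
            }],
            # "Add a relevant fault": a single action inside one step and branch.
            "steps": [{
                "name": "Step 1",
                "branches": [{
                    "name": "Branch 1",
                    "actions": [{
                        "type": "continuous",
                        "name": "urn:csci:microsoft:cosmosDB:failover/1.0",
                        "selectorId": selector_id,
                        "duration": duration,  # ISO 8601 duration string
                        "parameters": [{"key": "readRegion", "value": read_region}],
                    }],
                }],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder target ID; substitute your own subscription, group and account.
    target = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
              "/providers/Microsoft.DocumentDB/databaseAccounts/<account-name>"
              "/providers/Microsoft.Chaos/targets/Microsoft-CosmosDB")
    print(json.dumps(build_cosmos_failover_experiment(target, "West US 2"), indent=2))
```

The sketch only builds the document. Creating the experiment, granting its identity permissions on the Cosmos DB account, and running it would still happen through the portal or the Azure tooling of your choice.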

Check this quickstart from Microsoft, which goes through building an experiment: https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-quickstart-azure-portal

An experiment can have multiple steps and branches, so you can really build it out.
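As I understand it, steps run one after another, the branches inside a step run in parallel, and the actions inside a branch run in sequence. A small Python sketch of that model, useful for estimating how long an experiment will take, could look like this. The nested-list layout and the `total_minutes` helper are my own illustration, not part of Chaos Studio, and the estimate ignores any start-up or clean-up overhead.

```python
def total_minutes(steps):
    """Estimate experiment length in minutes.

    steps: list of steps; each step is a list of branches; each branch is a
    list of action durations in minutes. Assumes every step has at least one
    branch.
    """
    # A branch takes the sum of its actions, a step takes as long as its
    # slowest branch, and the experiment takes the sum of its steps.
    return sum(max(sum(branch) for branch in step) for step in steps)


experiment = [
    [[10], [5, 5, 5]],  # step 1: two parallel branches (10 and 15 minutes)
    [[20]],             # step 2: a single 20 minute fault
]
print(total_minutes(experiment))  # 35
```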

Not every Azure resource can be tested yet but I am sure Microsoft will keep adding to it.

Check the fault library linked below to see whether your required resource can be used. The faults range from Azure VM CPU pressure (an agent-based fault) to an Azure Redis reboot (a service-based fault). It would be nice to see an Azure SQL DB experiment, so we could see how our application would handle a failover to a different region.

https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-fault-library

Enjoy.
